


DOCUMENT RESUME



ED 050 553



BB 008 913 



AUTHOR          Hansen, Duncan N.; And Others
TITLE           Review of Automated Testing.
INSTITUTION     Florida State Univ., Tallahassee. Computer-Assisted
                Instruction Center.
SPONS AGENCY    Office of Naval Research, Washington, D.C. Personnel
                and Training Research Programs Office.
PUB DATE        26 Feb 71
NOTE            37p.; CAI Center Tech Memo Number 30



EDRS PRICE      EDRS Price MF-$0.65 HC-$3.29

DESCRIPTORS     *Automation, *Computer Oriented Programs, Computers,
                Input Output, Input Output Devices, *Literature
                Reviews, Models, Personality Assessment,
                Psychological Evaluation, *Psychological Testing,
                Student Testing, *Testing, Test Interpretation, Test
                Reliability, Test Scoring Machines, Time Sharing

ABSTRACT

A review of the literature on automated
psychological testing shows that most research and development in
this area is based on a one-test, one-psychologist model. In this
model, the functions of test administration, scoring, and
interpretation are thought out in terms of specific tests presented
on an individualized basis. However, more complete and sophisticated
psychological assessment can take place with a multi-test, multi-team
model. This model makes extensive use of time-sharing, interactive,
terminal-oriented computer systems. Methodological investigations and
research and development work on automated testing in the last decade
are reviewed in terms of the three dimensions of testing:
administration, scoring, and interpretation. A computerized
information management system for the storage and retrieval of
student evaluation files appears to be necessary. Such a system would
allow varying and appropriate reports to be generated for
psychologists, teachers, counselors, etc. The capability of computers
to accept and analyze natural language input must be further
developed before testing can become fully automated. (JX)






Duncan N. Hansen, John J. Hedl, Jr., and Harold F. O'Neil, Jr.

The Florida State University

Tech Memo No. 30

February 26, 1971

Project NR 154-280

Sponsored by

Personnel & Training Research Programs
Psychological Sciences Division
Office of Naval Research
Arlington, Virginia
Contract No. N00014-68-A-0494

This document has been approved for public release and sale;
its distribution is unlimited.




Tech Memo Series

The FSU-CAI Center Tech Memo Series is intended
to provide communication to other colleagues and interested
professionals who are actively utilizing computers in their
research. The rationale for the Tech Memo Series is threefold.
First, pilot studies that show great promise and will
eventuate in research reports can be given a quick distribution.
Secondly, speeches given at professional meetings can
be distributed for broad review and reaction. Third, the
Tech Memo Series provides for distribution of pre-publication
. . . professional journals.

In terms of substance, these reports will be concise,
descriptive, and exploratory in nature. While cast within a
CAI research model, a number of the reports will deal with
technical implementation topics related to computers and
their language or operating systems. Thus, we here at FSU
trust this Tech Memo Series will provide a useful service and
channel of communication for other workers in the area of computers
and education. Any comments to the authors can be forwarded
via the Florida State University CAI Center.



Duncan N. Hansen
Director
CAI Center













Security Classification

DOCUMENT CONTROL DATA - R & D
(Security classification of title, body of abstract and indexing annotation
must be entered when the overall report is classified)



1. ORIGINATING ACTIVITY (Corporate author)
Florida State University 
Computer-Assisted Instruction Center 
Tallahassee, Florida 



2a. REPORT SECURITY CLASSIFICATION

2b. GROUP

3. REPORT TITLE

Review of Automated Testing



4. DESCRIPTIVE NOTES (Type of report and inclusive dates)
Technical Memo No. 30, February 26, 1971

5. AUTHOR(S) (First name, middle initial, last name)

Duncan N. Hansen, John J. Hedl, Jr., and Harold F. O'Neil, Jr.






6. REPORT DATE
February 26, 1971

7a. TOTAL NO. OF PAGES
24

7b. NO. OF REFS
41

8a. CONTRACT OR GRANT NO.
N00014-68-A-0494

b. PROJECT NO.
NR 154-280


9a. ORIGINATOR'S REPORT NUMBER(S)

— 


9b. OTHER REPORT NO(S) (Any other numbers
that may be assigned this report)



DISTRIBUTION STATEMENT 

This document has been approved for public release and sale; 
its distribution is unlimited.



11. SUPPLEMENTARY NOTES

12. SPONSORING MILITARY ACTIVITY
Personnel & Training Research Program
Office of Naval Research
Arlington, Virginia

13. ABSTRACT



The primary purpose of this paper was to review the background literature
on automated psychological testing. In this respect, R & D efforts were
discussed within the traditional evaluation model involving test
administration, test scoring, and test interpretation. A more inclusive model
of the assessment process is discussed which reveals future possibilities
for computer applications. Preliminary specifications and required
developmental activities needed to operationalize this multi-test,
multi-professional assessment model are outlined within the framework of a
psycho-educational information management system.



DD FORM 1473 (PAGE 1)

1 NOV 65

S/N 0101-E07-6811    Security Classification

A-31408




U.S. DEPARTMENT OF HEALTH, EDUCATION
& WELFARE
OFFICE OF EDUCATION
THIS DOCUMENT HAS BEEN REPRODUCED
EXACTLY AS RECEIVED FROM THE PERSON OR
ORGANIZATION ORIGINATING IT. POINTS OF
VIEW OR OPINIONS STATED DO NOT NECESSARILY
REPRESENT OFFICIAL OFFICE OF EDUCATION
POSITION OR POLICY.



REVIEW OF AUTOMATED TESTING 



Duncan N. Hansen, John J. Hedl, Jr., and Harold F. O'Neil, Jr.
The Florida State University 



Tech Memo No. 30
February 26, 1971



Project NR 154-280
Sponsored by 

Personnel & Training Research Programs
Psychological Sciences Division
Office of Naval Research
Arlington, Virginia 
Contract No. N00014-68-A-0494 

This document has been approved for public release and sale; 
its distribution is unlimited.

Reproduction in whole or in part is permitted for any purpose
of the United States Government.












REVIEW OF AUTOMATED TESTING



Duncan N. Hansen, John J. Hedl, Jr., and Harold F. O'Neil, Jr.
Florida State University 



Introduction 

The active investigation of the use of automated equipment for
psychological testing spans the past decade. Numerous forces have
contributed to this investigation of the methodological requirements
for automating psychological testing. First, and foremost, the
amount of psychological and educational evaluation has increased by many
orders of magnitude. It is quite common to find both state and
national testing programs as well as expanded psychological and
guidance services within most major school systems.
Secondly, there is an ever-increasing demand for professional manpower,
and the available supply grossly fails to match the requirements for
diagnostic and evaluative assessment (Arnhoff, 1968; Boneau, 1968a;
Boneau, 1968b). Lastly, our assessment programs are becoming much more
sophisticated in the sense of using multiple tests and preparing
reports which are more prescriptive in shaping the future course of a
student's passage through our educational enterprise.

In regard to methodological investigations, a review of the
literature indicates that the predominant model has been the one test-
one psychologist focus. In essence, the functions of test administration,
scoring, and interpretation have been conceptualized, analyzed, and
explored in terms of specific tests presented on an individualized basis.

As will be pointed out in this paper, there are some serious problems
with such a limited model as the one test-one psychologist (OTOP)
focus. The major deficiencies are threefold. First, the goal of
increasingly sophisticated psychological assessment has contributed to
the growing use of test batteries with multiple requirements ranging
from cognitive through personality assessment; this trend is obviously
counter to the OTOP approach. Second, the OTOP model relates more
directly to the clinical approach, which has an operational deficiency
in bridging the hiatus between diagnostic assessment and prescriptive
guidance. Lastly, we would conjecture that methodological investigations
of the OTOP model are far too constrained, in that considering the full
domain of a multi-test, multi-team (MTMT) psychological testing service
opens up many new possibilities for the use of time-sharing, interactive,
terminal-oriented computing systems.

During the past decade, the team model for multiphasic psychological
testing and educational intervention has become a more predominant
theme. Psychologists, counselors, teachers, and other professionals are
realizing the need for an extension of the diagnostic, interpretation, and
intervention process. Thus, one could conjecture that the MTMT model will
lead to a better representation of the psychological assessment process.
Primary considerations of this model involve information gathering and
processing of specified behaviors, critical decisions based on the most
reliable and valid behavioral samples, and, most importantly, the collation
of these data for the generation of alternative hypotheses regarding the
interpretation and the implied educational treatments to be offered. The
MTMT model offers a broader context in which to adequately evaluate the
potential use of computer resources to reduce manpower requirements
and to extend the sophistication of the psycho-educational testing
process.

We turn now to a consideration of the methodological investigations
of automated testing, and their associated R and D problems, carried out
in the last decade. The paper is organized to cover the domains of test
administration, test scoring, and test interpretation. Most importantly,
a strong emphasis will be placed on the information processing and
multi-functional characteristics implied by the MTMT model so that a
broader range of R and D issues and subgoals can be considered.

Test Administration

Automated test administration concerns the interaction between
the student and the automated equipment being used for the test
presentation. There appear to be four areas of methodological activity in
this domain: 1) terminal equipment, 2) the interactive testing process,
3) reliability and validity issues, and 4) the collection of multiple
response indices.

Terminal Equipment. In reference to the availability of automated
terminal equipment, it is quite common to find typewriters, cathode ray
tubes, and slide projectors being used for test item presentation. Since
the creation of inexpensive terminal equipment is one of the dynamic areas
in computer technology, one can anticipate more sophisticated terminal
devices as well as a significant decrease in their cost. On the other
hand, progress with respect to the operation of appropriate audio
presentation units and natural speech analyzers has been discouraging.
Although digitized speech and speech analysis devices are being
investigated at Stanford and Haskins Laboratories, respectively, the
generic problems involved in natural speech analysis are delaying the
development of new equipment.

In regard to psycho-motor/manipulative presentations, cost seems
to be one of the greatest deterrents to any extensive development. It
should be anticipated, though, that this may be overcome within the
coming decade.

Interactive Test Process. Turning to the characteristics of the
student-terminal interaction, several investigators have provided indirect
evidence that this man-machine dialogue may be characterized as unbiased,
non-stressful, and nearly human in nature. For example, Smith (1963) points
to a "confession machine effect" which appears to enhance data acquisition
in particular content areas, such as the subject's personal experience or his
perceived personality characteristics. Evans and Miller (1969) found that
students responded with greater honesty and candor to highly personal items
of a social science questionnaire when it was administered by a computer
rather than conventionally. Cogswell and Estavan (1965) have also reported
similar findings on the apparent confidentiality of the computer interview.

This neutral nature of the computer evaluation experience may
also be inferred from CAI research dealing with Trait-State Anxiety Theory
(Spielberger, Lushene, and McAdoo, 1971). In this CAI anxiety research, a
conceptual distinction is made between state anxiety, which consists of
feelings of apprehension that vary in intensity and fluctuate over time,
and trait anxiety, which refers to individual differences in anxiety
proneness. In two studies (O'Neil, Spielberger, & Hansen, 1969; O'Neil,
Hansen, & Spielberger, 1969) the CAI learning experience did not seem to
differentially affect state anxiety responses for high and low trait
anxiety Ss, although high trait anxiety Ss gave significantly higher
responses overall. An analysis of the CAI situation revealed a possible
explanation for the absence of any relationship between trait anxiety and
differential increases in state anxiety within this CAI setting. In the
CAI task, the computer did not evaluate the adequacy of the S's
performance relative to others and, therefore, did not pose a threat to
self-esteem. Because they did not find differential shifts in A-State for
low and high trait anxious Ss, these two studies lend indirect evidence
for the implied impersonal nature of a computer task.

More direct evidence for the non-threatening nature of computer-based
evaluation comes from a study by Gallagher (1970). He investigated
the relationship of instructional treatments and learner characteristics
in a terminal-oriented, computer-managed instruction course. Comparing
computer evaluation with instructor evaluation of term projects yielded
some rather interesting findings. Trait anxiety scores were negatively
related to performance (r = -.51) in the instructor-evaluated group,
but were not related in the computer-evaluated group (r = -.03). If one
assumes that the treatment which emphasized human interaction (the
instructor-evaluated group) would pose a greater threat to the
individual's self-esteem, then these results are consistent with
Trait-State Anxiety Theory. In addition, these results provide some
evidence that the interactive computer process may be less threatening
and, therefore, more neutral in nature, at least in the situations
studied to date.
Reliability and Validity. In addition to these considerations,
computer-based evaluation may have important reliability and validity
implications. Computer-based administration of psychological tests should
increase the reliability and validity of the test information because of
the more neutral features of its interaction. Since the computer may be
conceptually objective and neutral, its use to administer tests should
eliminate certain possible human biases resulting from the typical dyadic
interaction between examiner and student. The reduction of these affective
error variance components should lead to increased reliability of the tests
(Cronbach, 1960).

Reliability and validity studies concerning automated administration
procedures have demonstrated, from an empirical standpoint, the
feasibility of a technological approach and have paved the way for
further research and developmental efforts. For example, Elwood (1969)
developed a non-computerized automated testing booth to administer the
Wechsler Adult Intelligence Scale (WAIS). Orr (1969) reported favorable
results for this approach from a comparison of an automated WAIS
presentation with a traditional WAIS presentation (r = .93). However, this
system only provides scoring capabilities for 2 of the 11 subtests
(Digit Span and Digit Symbol). Recent computer methodology (Hedl,
O'Neil, & Hansen, 1971), to be reported in an associated paper, describes
how the administration of intelligence test items can be programmed to
allow for repetition and expansion of verbal responses. This more
contingent, interactive elicitation of responses appears to yield
reliability and validity indices equivalent to those found for human
presentation.
In a study of computer-based sequential testing, Hansen (1969)
found a significant improvement in internal consistency reliability for
computer presentation (r = .80) in comparison with a conventional
classroom achievement test (r = .43). More interestingly, the computer-based
test yielded a significant relationship (r = .76) with a college entrance
aptitude score.
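The text does not name the internal consistency coefficient behind these figures. For right/wrong items, one common choice is the Kuder-Richardson formula 20, KR-20 = k/(k-1) * (1 - sum of p*q / variance of total scores). The sketch below assumes that index and uses a small made-up response matrix, so it illustrates the computation rather than reproducing Hansen's analysis.

```python
from typing import List


def kr20(responses: List[List[int]]) -> float:
    """Kuder-Richardson formula 20 for a matrix of 0/1 item scores.

    Each row is one examinee; each column is one item (1 = right, 0 = wrong).
    Population variances are used for both the items and the total score.
    """
    n_people = len(responses)
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("KR-20 needs at least two items")

    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_people
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_people
    if var_total == 0:
        raise ValueError("total scores have zero variance")

    # Sum of item variances p * q, where p is the proportion answering correctly.
    pq_sum = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people
        pq_sum += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - pq_sum / var_total)


# Hypothetical data: five examinees, four items.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(scores), 2))  # 0.8
```

Running the same function on computer-administered and conventionally administered forms of a test is one way the kind of comparison reported above could be made.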

Parenthetically, one is surprised at the sparseness of the studies 
that directly compare reliability and validity of computer approaches with 
conventional administration. Obviously, considerable empirical study 
remains to be performed. 

Multiple Response Collection. In reference to multiple response
collection, the MMPI research at Florida State University (Dunn, Lushene,
& O'Neil, 1971) represents an attempt at the total automation of the MMPI.
The inventory items are presented on a cathode-ray tube, and latency is
recorded as the student responds to each item. Immediately following the
completion of the test, the system prints out its interpretation of the
data. These latency results will be reported later in an associated
report.
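The administration loop described above (present an item, wait for the reply, store the reply together with its latency) can be sketched as follows. This is not the FSU MMPI program; the `administer` function, the `ItemRecord` fields, and the injectable clock are hypothetical choices made so the timing can be checked without a live terminal.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ItemRecord:
    item_number: int
    response: str      # normalized reply, e.g. "T" or "F"
    latency_s: float   # seconds from item presentation to reply


def administer(items: List[str],
               get_response: Callable[[str], str],
               clock: Callable[[], float] = time.perf_counter) -> List[ItemRecord]:
    """Present each item, collect the reply, and record its latency."""
    records = []
    for number, text in enumerate(items, start=1):
        start = clock()                # item appears on the display
        reply = get_response(text)     # e.g. input(text + " (T/F) ") at a terminal
        latency = clock() - start
        records.append(ItemRecord(number, reply.strip().upper(), latency))
    return records


# Usage with a scripted examinee and a fake clock, so the output is deterministic.
ticks = iter([0.0, 1.5, 2.0, 4.0])
replies = iter(["t ", "F"])
log = administer(["Item one", "Item two"],
                 get_response=lambda _text: next(replies),
                 clock=lambda: next(ticks))
for rec in log:
    print(rec)
```

In live use the default `time.perf_counter` clock would be kept and `get_response` would read from the terminal.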

As a part of the computer-based sequential test, Hansen (1969)
found that the addition of subjective confidence responses yielded improved
validity coefficients. Massengill and Shuford (1967) have reported
similar results. Obviously, the full potential of multiple dependent
measures remains to be empirically explored within automated testing.

The R & D efforts concerning the automation of psychological
testing have focused essentially on the OTOP model. In essence, these
research applications attempted to simulate standard clinical testing
procedures. A standard psychometric test was automated in terms of
test administration, and the results were then compared with traditional
testing procedures. Although most of the results have demonstrated the
feasibility of the computer methodology, the research has been limited
in scope. For example, there has been no attempt to develop test items
specifically for a computer-based approach. Given the increase in
psychological assessment problems in our nation's schools, broader
conception and implementation of computer testing applications are
needed to extend the diagnostic, interpretation, and intervention process.

On the other hand, the goal of the MTMT model is to expedite the
gathering of psychological and cognitive information to provide
for sufficient intervention and treatment programs. This goal can only
be achieved through a broader conception of the assessment process.

First, research should focus on the computer aspects centering around
input and output of natural language during on-line communication
between the student and the system. Starkweather (1965), Colby, Watt,
& Gilbert (1966), and Weizenbaum (1966) have developed computer techniques
to conduct psychotherapeutic dialogues with patients. These natural
language processing techniques could be utilized to extend and enrich the
interviewing and test-interactive aspects of a test battery. Hedl, O'Neil,
& Hansen (1971) have shown that an interactive dialogue is possible with
the automated administration of an individualized intelligence test.

A second emphasis implied by the MTMT model would be the determination
of the optimal psychologist-computer-student interaction. Questions
of student interest and motivation are of primary concern here. Efficient
and reliable data gathering can only be achieved if the student places
appropriate confidence in the psychologist and the computer. In essence,
one needs to plan and study, from a systems viewpoint, the adaptive aspects
of the total assessment process.

Third, the number and variety of psycho-educational and psychological
tests to implement within the MTMT model would, of necessity, need to
be quite extensive. In addition, decisions about test administration
should possibly stress the increased use of subtest scales within test
batteries. Specific findings from an initial test battery could be
immediately followed up with in-depth evaluation to determine more
precisely the nature and scope of a particular aptitude or disability.

This multi-testing procedure reveals new possibilities for
computer applications in the assessment process. It could extend the
variety of information available on a student and provide differential
data for the psychologist, teacher, and counselor. Given that the
information needs of these professionals differ, the multi-test battery
approach dictates the need for precise determination of the information
requirements of each professional. Thus, an automated approach could allow
far greater flexibility in the composition of the test battery, as well as
possibly individualized subtest sequences that would maximize motivation
and adaptation by the student. Obviously, these issues flowing from the
MTMT model remain to be investigated.

Automated Scoring

The ease of an automated approach to test scoring appears to
vary along a structured/unstructured response dimension. For example,
multiple-choice test item formats can be considered highly structured
and, therefore, extremely easy to process by computer using either optical
scanners or on-line terminals. On the other hand, natural language inputs
are quite unstructured as to vocabulary and grammatical characteristics
as well as semantic content, and thus are more difficult to process.
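At the structured end of the dimension, scoring reduces to comparing each response with a key. A minimal sketch, assuming single-letter options and treating blank or unreadable marks (represented here as `None`) as wrong:

```python
from typing import List, Optional


def number_right(responses: List[Optional[str]], key: str) -> int:
    """Count responses that match the answer key; None means no usable mark."""
    if len(responses) != len(key):
        raise ValueError("response sheet and key differ in length")
    return sum(
        1
        for given, correct in zip(responses, key)
        if given is not None and given.strip().upper() == correct.upper()
    )


print(number_right(["a", "B", None, "C"], "ABCD"))  # 2
```

Nothing this simple is available at the unstructured end, which is why the natural language work reviewed below relies on dictionaries and lookup procedures.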

This structured/unstructured dimension has been identified in
order to provide a framework within which to consider the methodological
process found in automated scoring techniques. This section will briefly
mention conventional test scoring via optical scanners and then evaluate
the research developments in natural language processing of verbal responses,
the use of multiple index scores, and, finally, sequential testing.

Test Scoring. Although the employment of computers to calculate
test scores and to carry out statistical analyses and summaries of test
data has been common for many years, the volume has been growing at a
considerable rate. The advent of test scoring machines and the more
sophisticated optical scanners has provided commercial testing services
such as Educational Testing Service, Measurement Research Center,
California Test Bureau, Science Research Associates, etc. with the
capability for processing millions of student tests. Woods (1970) presents
a comprehensive survey of the general uses of such data processing
techniques in school testing programs. However, the application of these
response analysis techniques to on-line, terminal-oriented computer testing
systems is a recent advance. We turn now to the consideration of the
use of natural language processing for test responses.

Natural Language Processing. One of the most significant developments
for the analysis of language has been the General Inquirer System,
a system of computer programs for content analysis of English texts
(Stone, Dunphy, Smith, & Ogilvie, 1966). Using special "dictionaries" of
words precategorized for specific research purposes, the system
automatically tallies frequencies of category usage for a body of text
material. The materials which have been analyzed range from suicide notes
(Stone et al., 1966) to Thematic Apperception Test narratives (Smith, 1968).
Bhusham and Ginther (1968) have reported using this system to analyze
essays.
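The dictionary-tally idea can be illustrated in a few lines. The category names and word lists below are invented for the example and are not the General Inquirer's dictionaries, which were far larger and also handled details such as word endings; like most of the applications described here, the sketch ignores syntax entirely.

```python
import re
from collections import Counter
from typing import Dict, Set

# Hypothetical category dictionary, for illustration only.
DICTIONARY: Dict[str, Set[str]] = {
    "FAMILY": {"mother", "father", "brother", "sister"},
    "NEGATIVE-AFFECT": {"sad", "alone", "afraid", "hopeless"},
}


def tally_categories(text: str,
                     dictionary: Dict[str, Set[str]] = DICTIONARY) -> Counter:
    """Count how many words in the text fall into each dictionary category."""
    words = re.findall(r"[a-z']+", text.lower())
    counts: Counter = Counter()
    for word in words:
        for category, entries in dictionary.items():
            if word in entries:
                counts[category] += 1
    return counts


print(tally_categories("My mother and father left; I feel sad and alone."))
```

A word that appears in several categories is counted once in each, which mirrors the idea of tallying category usage rather than scoring individual words.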

Most applications of the General Inquirer have ignored the problems
of syntax. Goldberg (1966), for instance, applied the system to
sentence completions with some success. Other researchers in the field
of automated content analysis have evaded syntax problems by restricting
the responses of the subject in one manner or another. In developing
a computer-based system for scoring responses to the Holtzman Ink-Blot
Test, Gorham (1967) restricted subjects to the use of six words for
each blot. Even with this restriction, the correlations between hand
and computer scoring equalled or exceeded interscorer reliability for
15 of the 17 variables.
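Comparisons such as Gorham's rest on correlating two sets of scores for the same protocols: hand versus computer, and one hand scorer versus another (the interscorer reliability). A plain Pearson product-moment sketch follows; the score lists are hypothetical and carry no information about Gorham's actual results.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores on one variable for six protocols.
scorer_a = [12, 7, 15, 9, 11, 4]
scorer_b = [11, 8, 14, 10, 12, 5]
computer = [12, 6, 15, 9, 10, 5]

interscorer = pearson_r(scorer_a, scorer_b)
hand_vs_computer = pearson_r(scorer_a, computer)
print(f"interscorer r = {interscorer:.2f}, hand vs. computer r = {hand_vs_computer:.2f}")
print("hand-computer agreement at least equals interscorer r:",
      hand_vs_computer >= interscorer)
```

Repeating this comparison variable by variable gives the kind of tally reported above (how many variables meet or exceed the interscorer figure).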

Peck and Veldman (1961) of the University of Texas have been
developing a computer-based system for presenting and scoring responses
to a sentence completion test. The problems of syntax were reduced by
restricting the subject to a single word in responding to
each sentence stem. The most recent system (Veldman, 1967) produces
40 scores from a 36-item form and employs a complex word-root data
reduction system. This prototypic tailored inquiry method offers many
of the benefits of a traditional interview, and might serve as a basis
for future programs which could conduct intensive assessment interviews.

Recently, Archambault (1970) developed a computerized program to
score verbal responses to three of the seven subtests of the Torrance
Tests of Creative Thinking. The subtests considered were the Ask and
Guess subtests (Activities 1, 2, and 3), in which subjects ask questions
about a drawing and make guesses about the causes and consequences of a
pictured event. Subject responses to each of these subtests are scored
for fluency, flexibility, and originality.

For each of these categories, a dictionary of entries was
constructed by analyzing the model responses given by Torrance for key words
and phrases in Roget's International Thesaurus (1962) and Soule's
Dictionary of English Synonyms (1966). The test was administered in
traditional fashion, and the student responses were keypunched on standard
IBM cards, one response to a card. These responses were then analyzed in
a batch-processing mode. A word/phrase lookup procedure was performed to
determine the frequency with which categories were used.

Archambault's data indicated that creativity, as defined by
Torrance, was judged accurately by the computer. The syntax problems
were reduced by analyzing only the frequency of word usage. Even so,
this word/phrase lookup procedure produced significant correlations,
ranging from .52 to .99, between the computer and the pooled scores of
four trained judges. It appears that the use of a computer to score
open-ended responses to standardized test items is feasible and should
be further investigated.
The above-mentioned studies employed word dictionaries for their
natural language programs. Essentially, the input data were compared
against the dictionary entries in order to detect the presence or absence
of certain word usage categories. Based upon the occurrence or
non-occurrence of matches with the dictionary, scoring and branching
decisions were made concerning the students' responses. The tests were
administered in traditional fashion, and the resultant data were then
keypunched and analyzed in a batch-processing mode; the responses were not
evaluated on a real-time basis. The automated Slosson Intelligence Test
(Hedl et al., 1971) also employs a word dictionary approach; however, the
input responses are immediately analyzed for their correctness.
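The difference from the batch studies is that each reply is checked against the dictionary the moment it is typed, so the program can branch at once. The sketch below assumes a three-way outcome, including a hypothetical `query` result that asks the student to elaborate when a reply matches neither list; the word lists and the function name are illustrative and are not taken from the Slosson program.

```python
import re
from typing import Set


def classify_response(reply: str, accepted: Set[str], rejected: Set[str]) -> str:
    """Return 'correct', 'incorrect', or 'query' for a typed reply.

    Accepted words are checked first, so a reply containing both an accepted
    and a rejected word is scored correct under this simple rule.
    """
    words = set(re.findall(r"[a-z]+", reply.lower()))
    if words & accepted:
        return "correct"
    if words & rejected:
        return "incorrect"
    return "query"  # unrecognized: prompt the student to say more


# Hypothetical dictionary for one vocabulary-style item.
accepted = {"large", "big", "huge"}
rejected = {"small", "tiny"}
for reply in ["It is LARGE", "small", "I'm not sure"]:
    print(reply, "->", classify_response(reply, accepted, rejected))
```

Because the check runs per reply, a `query` result can trigger the kind of repetition and expansion of verbal responses described earlier for the automated intelligence test.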

One of the major problems in implementing computer analysis of 
natural language pertains to an economically feasible input system. This 
difficulty should be solved with the development of better interactive 
terminal devices and time-sharing computing systems. 

Multiple Index Scores 

The interactive testing approach exemplified by the two following papers illustrates new dimensions in the analysis of heretofore unexamined response characteristics. Multiple dependent measures such as latency, subjective confidence, and anxiety can be incorporated to improve both the diagnostic power and the efficiency of psychometric instruments. Research with the MMPI (Dunn et al., 1971) has shown that the information processing time (latency) for a given item is partially a function of the number of characters in the item, the ambiguity of the item, and the social desirability value of the item. Massengill and Shuford (1967) have shown that subjective confidence ratings significantly increase test reliability. Hansen (1969) reported an improved predictive relationship for a college entrance aptitude measure when confidence scores are included with the right/wrong CAI scores.
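The paragraph above does not give a formula for confidence scores. A common way to score subjective probabilities is a proper scoring rule, which rewards students for reporting their true confidence. The sketch below assumes the logarithmic rule with a floor, keeps latency as a separate index, and uses hypothetical field names. It should not be read as the procedure used in the studies cited above.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ItemResponse:
    probs: Dict[str, float]   # option -> subjective probability the student assigned
    correct: str              # keyed (correct) option
    latency_s: float          # seconds from item display to response


def log_score(resp: ItemResponse, floor: float = 0.01) -> float:
    """Logarithmic scoring rule: log of the probability placed on the keyed option.

    The floor keeps a confident wrong answer from scoring minus infinity.
    """
    if not math.isclose(sum(resp.probs.values()), 1.0, abs_tol=1e-6):
        raise ValueError("subjective probabilities must sum to 1")
    return math.log(max(resp.probs.get(resp.correct, 0.0), floor))


def multiple_indices(responses: List[ItemResponse]) -> Dict[str, float]:
    """Keep right/wrong, confidence and latency as separate indices."""
    # The option given the highest probability counts as the student's answer
    # (ties go to the first option listed).
    right = sum(max(r.probs, key=r.probs.get) == r.correct for r in responses)
    return {
        "right": right,
        "log_score": sum(log_score(r) for r in responses),
        "mean_latency_s": sum(r.latency_s for r in responses) / len(responses),
    }
```

Keeping the three indices separate leaves the weighting decision to whatever combination or factor-scoring procedure is applied later.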

Confidence or subjective probability scores may have great potential for improving diagnostic procedures, in that this additional subjective information more closely approximates many clinical assessment procedures. Moreover, a procedure for calculating factor scores recommended by Cattell (1965) could be implemented within the overall system.

Sequential Testing. As on-line scoring becomes more widely used, sequential testing is likely to become part of the scoring methodology. Sequential testing is a procedure in which the selection of each item is contingent on prior performance. In addition, subtest sequences can be altered according to real-time behavioral data samples and according to the objective of the testing procedure as specified by the psychologist, teacher, or counselor. Sequential selection of the tests to be administered can also be incorporated within the overall system. In this respect, sequential testing is necessary to solve the logistic problems presented by implementing an MTMT model that strives for in-depth differential student assessment. Widescale adoption of this procedure within the MTMT model would eliminate the concept of mass test administration (Cleary, Linn, & Rock, 1968).
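The text defines sequential testing only in general terms: each item is chosen according to prior performance. The sketch below shows one simple version of that idea, an up-and-down (staircase) rule. It assumes an item pool grouped into difficulty levels. After each correct answer the next item comes from the next harder level, and after each error from the next easier one. The pool layout and the stopping rule are hypothetical.

```python
from collections import deque
from typing import Callable, List, Tuple


def staircase_test(levels: List[List[str]], answer: Callable[[str], bool],
                   start: int, max_items: int) -> Tuple[List[Tuple[str, int, bool]], int]:
    """Administer items one at a time, choosing each from a difficulty level
    that depends on the previous response.

    levels[0] holds the easiest items and levels[-1] the hardest.
    Stops after max_items or when the current level has no unused items.
    Returns the administration record and the level reached at the end.
    """
    queues = [deque(items) for items in levels]
    level = start
    record = []
    while len(record) < max_items and queues[level]:
        item = queues[level].popleft()          # a fresh item at the current level
        correct = answer(item)
        record.append((item, level, correct))
        # Step up after a correct answer, down after an error (bounded at both ends).
        level = min(level + 1, len(levels) - 1) if correct else max(level - 1, 0)
    return record, level
```

The same branching logic could select whole subtests instead of single items. Only the contents of `levels` would change.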

Sequential testing is also being employed for criterion performance assessment within Individually Prescribed Instruction (IPI). Ferguson (1970) has described a model for computer-assisted criterion-referenced testing. The essential assumption of the approach is a hierarchical sequence of skill performance levels. Items are presented within a given skill area until sufficient information is available to formulate a mastery or non-mastery decision on that particular skill.

The Pittsburgh IPI project is currently using this form of sequential testing to facilitate the assessment and management aspects of its instructional program.
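The text does not say how the model decides that "sufficient information is available." One standard stopping rule for a binary mastery decision is Wald's sequential probability ratio test, and the sketch below uses it. It is not claimed to be Ferguson's exact procedure. The parameter values are hypothetical: a master answers correctly with probability .85 and a non-master with probability .60, and both error rates are .05.

```python
import math
from typing import Iterable, Tuple


def sprt_mastery(responses: Iterable[bool], p_master: float = 0.85,
                 p_nonmaster: float = 0.60, alpha: float = 0.05,
                 beta: float = 0.05, max_items: int = 20) -> Tuple[str, int]:
    """Wald's sequential probability ratio test for a mastery decision.

    responses yields True/False for successive items in one skill area.
    Returns the decision and the number of items it took.
    """
    upper = math.log((1 - beta) / alpha)      # crossing it -> accept "mastery"
    lower = math.log(beta / (1 - alpha))      # crossing it -> accept "non-mastery"
    step_right = math.log(p_master / p_nonmaster)
    step_wrong = math.log((1 - p_master) / (1 - p_nonmaster))
    llr, n = 0.0, 0
    for correct in responses:
        n += 1
        llr += step_right if correct else step_wrong
        if llr >= upper:
            return "mastery", n
        if llr <= lower:
            return "non-mastery", n
        if n >= max_items:
            break
    return "undecided", n
```

With these settings, a student who answers every item correctly is classified as a master after 9 items, and a student who misses every item is classified as a non-master after 4. The `max_items` cap caps test length when a student's responses are mixed and the ratio stays between the bounds.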

In summary, as methodological advances occur in natural language processing, in multiple dependent measures for combination or factor scores, and in sequential testing, the potential of the MTMT model will be realized. In essence, the full array of student scores will be stored in learning history vectors and become an operational component of the educational process.

As developments in natural language processing become more sophisticated, the distinction between structured and unstructured response processing will cease to be a major consideration. Natural language processing and multiple dependent measures will become integrated in the student's score file and ultimately far more useful in the instructional process.

Automated Interpretation 


The challenge of automated interpretation of test results consists of converting quantitative indices or profiles into meaningful verbal statements. While the R and D effort in this area is quite limited, one can foresee a great need for methodological development because of the extensive manpower this phase of the testing process requires. As to the reasons for the limited R and D effort to date, one should recognize that an essential characteristic of the psychologist's role is providing human dialogue and interpretation regarding the outcomes of the testing process. Moreover, the interpretation of quantitative scores has always been a problem because the audiences receiving them vary in sophistication. In turn, generating professionally appropriate interpretations for psychological colleagues, guidance counselors, classroom teachers, and parents requires varying both the depth of interpretation and the use of quantitative concepts. Given these reasons for the limited progress in automated interpretation of test results, this section concentrates on the personality domain, where more substantial methodological progress has been demonstrated than in the aptitude area. Preliminary research in the aptitude and achievement area is then briefly discussed. The section concludes with a review of beginning efforts to develop an information management system for test result interpretations.

Personality Test Interpretation. The first operational system for the MMPI was developed at the Mayo Clinic (Rome, Swenson, Mataya, McCarthy, Pearson, and Keating, 1962) for routine use with medical and surgical patients. Glueck and Reznikoff (1965) have modified the Mayo program for application to a psychiatric inpatient population. More complex scoring and interpretive systems for the MMPI have been developed by Finney (1967) and Fowler (1969).

A number of less-than-complete interpretive efforts have been made: many programs provide interpretive statements based upon some limited aspect of the profile, or examine the test scores for congruence with some specified profile type. Thus, there are programs to examine MMPI scores for the Gilberstadt-Duker and Marks-Seeman code types, to apply the Meehl-Dahlstrom profile discrimination rules, or to identify maladjusted college students generally (Kleinmuntz, 1963). These programs produce category descriptions when a student is positively identified.

The above-mentioned programs involve both scoring and interpretive routines. In contrast, the Rorschach program (Piotrowski, 1964) has only an interpretive system, which analyzes previously obtained scores. Agreement between the program and clinical diagnosis was 86 percent.

Essentially, both the MMPI and Rorschach programs examine the configuration of certain test scales or scores and then, depending upon the scale elevations, locate appropriate sentences or paragraphs stored in the computer's memory. The interpretive statements are then combined and a report is produced.
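The look-up-and-combine process just described can be sketched as follows. This is a generic illustration, not the Mayo or Piotrowski program. The elevation cutoff, the statement library, and the two-point-code rule are all hypothetical. The bracketed texts stand in for stored sentences or paragraphs, and the sketch deliberately contains no actual clinical interpretations.

```python
from typing import Dict, List, Tuple

CUTOFF = 70  # hypothetical T-score at or above which a scale counts as elevated

# Hypothetical statement library keyed by single scales and by two-point codes.
STATEMENTS: Dict[str, str] = {
    "2": "[stored paragraph for an elevated Scale 2]",
    "7": "[stored paragraph for an elevated Scale 7]",
    "2-7": "[stored paragraph for the 2-7 code type]",
}


def two_point_code(t_scores: Dict[str, int]) -> Tuple[str, str]:
    """The two highest scales, highest first (ties broken by scale name).

    Assumes the profile contains at least two scales.
    """
    top = sorted(t_scores, key=lambda s: (-t_scores[s], s))[:2]
    return top[0], top[1]


def interpret(t_scores: Dict[str, int]) -> str:
    """Look up stored statements for the profile configuration and join them."""
    parts: List[str] = []
    code = two_point_code(t_scores)
    code_key = "-".join(sorted(code))          # treat 2-7 and 7-2 alike
    if all(t_scores[s] >= CUTOFF for s in code) and code_key in STATEMENTS:
        parts.append(STATEMENTS[code_key])     # configural statement first
    for scale in sorted(t_scores, key=lambda s: -t_scores[s]):
        if t_scores[scale] >= CUTOFF and scale in STATEMENTS:
            parts.append(STATEMENTS[scale])    # then single-scale statements
    return " ".join(parts) if parts else "[no elevated scales]"
```

The program's "knowledge" lives entirely in the statement library and the selection rules, so revising an interpretation means editing data rather than code.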

Recent efforts by Fowler with the MMPI exemplify the concept of variable interpretive reports intended for different but specifiable audiences. Unlike his earlier work, and the work of others in the interpretation research area, which dealt extensively with clinical interpretation of score profiles, Fowler is currently designing a program to write different psychological reports depending upon the nature of the intended audience. Implicit in this work is the need for a concise specification of the informational needs of the personnel who will eventually read, process, and act upon the interpreted results. Using an audience-rating methodology, each version is updated according to readability, audience relevance, and professional utility criteria. One can anticipate that these methodological techniques will be used to extend automated interpretive efforts in the future.
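The idea of audience-specific reports can be illustrated by rendering one finding at different levels of quantitative detail. The sketch below is not Fowler's program. The audiences, templates, verbal bands, and the reading-comprehension example are hypothetical. An educational example is used only to keep the illustration free of clinical content.

```python
from typing import Dict

# Hypothetical templates: one finding, worded for different readers.
TEMPLATES: Dict[str, str] = {
    "psychologist": "Scale {scale}: T = {t}, percentile rank {percentile}.",
    "teacher": "The student scored higher than about {percentile}% of students on {label}.",
    "parent": "Your child's {label} result is {band} compared with other children.",
}


def band(percentile: int) -> str:
    """Coarse verbal band for readers who should not get raw numbers (hypothetical cutoffs)."""
    if percentile >= 75:
        return "above average"
    if percentile <= 25:
        return "below average"
    return "about average"


def render(finding: Dict, audience: str) -> str:
    """Word the same finding for the given audience."""
    if audience not in TEMPLATES:
        raise ValueError(f"no template for audience {audience!r}")
    # str.format ignores keys a template does not use.
    return TEMPLATES[audience].format(band=band(finding["percentile"]), **finding)
```

In an audience-rating approach like the one described above, reader ratings would drive revisions to these templates. The scoring side stays unchanged.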
Aptitude Test Interpretation. Two examples of purely interpretive programs for aptitude and achievement tests are available. These systems require test scores as input and provide minimal interpretation of the patterns of scores. Within the area of aptitude and achievement measures, Helm (1965) has programmed the evaluation of a battery of individual scores per student. Sixty-five classes of sentences were derived from written psychological reports. The rule classifications incorporated approximately 90 percent of the information in the psychological reports. The output report consisted basically of simple sentences designed as direct translations of scores, although some provision was made for compound sentences to handle contrasts or similarities between two or more profile scores.
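The output style described for Helm's program can be sketched as follows: simple sentences that translate each score directly, plus a compound sentence when two scores contrast. This is not Helm's rule set. The sketch assumes standard scores with a mean of 100 and a standard deviation of 15, and the sentence classes and the 15-point contrast threshold are hypothetical.

```python
from itertools import combinations
from typing import Dict, List


def level(score: float) -> str:
    """Sentence class for a standard score (mean 100, SD 15 assumed)."""
    if score >= 115:
        return "well above average"
    if score <= 85:
        return "well below average"
    return "in the average range"


def simple_sentences(scores: Dict[str, float]) -> List[str]:
    """One direct translation per score."""
    return [f"{name} is {level(s)}." for name, s in scores.items()]


def contrast_sentences(scores: Dict[str, float], gap: float = 15) -> List[str]:
    """Compound sentences for pairs of scores at least `gap` points apart."""
    out = []
    for (a, sa), (b, sb) in combinations(scores.items(), 2):
        if abs(sa - sb) >= gap:
            hi, lo = (a, b) if sa > sb else (b, a)
            out.append(f"{hi} is a relative strength, and {lo} is a relative weakness.")
    return out


def report(scores: Dict[str, float]) -> str:
    """Simple sentences first, then any contrast sentences."""
    return " ".join(simple_sentences(scores) + contrast_sentences(scores))
```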

In the area of counseling, Cogswell and Estavan (19; ') have developed a program to evaluate student folders containing such input information as grades and aptitude test scores. Applying rules derived from previous counselor judgments, the program selects appropriate output statements such as: "Student's grades have gone down quite a bit. Ask about this in an interview." There was 75 percent agreement between the computer statements and the evaluative behavior of two counselors.
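The folder-evaluation approach described above can be sketched as a list of condition/statement rules, with a percent-agreement check against counselor judgments. The folder fields, the rules, and the 0.5 grade-point threshold are hypothetical. Only the wording of the first statement is modeled on the example quoted in the text.

```python
from typing import Callable, Dict, List, Tuple

Folder = Dict[str, object]


def grades_dropped(folder: Folder, threshold: float = 0.5) -> bool:
    """True if grade-point average fell by at least `threshold` since last term."""
    g = folder["gpa_by_term"]
    return len(g) >= 2 and g[-2] - g[-1] >= threshold


def low_grades_high_aptitude(folder: Folder) -> bool:
    """True if current grades are low despite a high aptitude percentile."""
    g = folder["gpa_by_term"]
    return bool(g) and folder["aptitude_percentile"] >= 80 and g[-1] < 2.0


# Hypothetical rule base: (condition, output statement) pairs.
RULES: List[Tuple[Callable[[Folder], bool], str]] = [
    (grades_dropped,
     "Student's grades have gone down quite a bit. Ask about this in an interview."),
    (low_grades_high_aptitude,
     "Grades are low relative to measured aptitude."),
]


def evaluate_folder(folder: Folder) -> List[str]:
    """Return every statement whose condition the folder satisfies."""
    return [statement for condition, statement in RULES if condition(folder)]


def percent_agreement(program_flags: List[bool], counselor_flags: List[bool]) -> float:
    """Percentage of cases in which the program and a counselor made the same call."""
    if len(program_flags) != len(counselor_flags) or not program_flags:
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(p == c for p, c in zip(program_flags, counselor_flags))
    return 100 * matches / len(program_flags)
```

Deriving the rules from counselor judgments, as the text describes, amounts to choosing the conditions and thresholds in `RULES` to reproduce past decisions.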

Information Management Systems. Given the emphasis on a multi-test, multi-professional approach to assessment, an information management system (IMS) for the storage and retrieval of student evaluation files appears to be necessary. In this way, varying but appropriate reports can be generated for psychologists, teachers, counselors, etc. Implicit in the MTMT model is the conception of a continuous record system with automated interpretive capability. All too often, the school psychologist or classroom teacher perceives instructional problem cases within the framework of symptom disorders, either achievement-related or psychological in nature. A totally automated diagnostic system with interpretive capability could be preventive in nature, in that continuous information would be available on each student and would be processed by the appropriate personnel at their level of information capability.

This IMS should also be able to suggest treatment possibilities for identified problem disorders. In addition, probabilistic statements could be presented concerning possible causes of, or treatment alternatives for, each student's difficulties. A continuous cybernetic approach to the IMS would update the current interpretation and treatment statements. In other words, the effects of different treatments would be stored in the IMS and compared with the previous predictions for the purpose of actuarial updating. Thus, more valid and more precise statements of diagnostic and instructional activities would be readily available. One can anticipate that R and D efforts in automated interpretation of test results will follow the trend toward incorporation within IMS developments.
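The text leaves the actuarial updating procedure unspecified. One simple way to store treatment outcomes and attach probabilities to treatment alternatives is a Beta-Binomial update of each treatment's success rate. The sketch assumes that method, a uniform Beta(1, 1) prior, and hypothetical problem and treatment labels. As each new outcome is recorded, the ranking of alternatives for a problem category is updated.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TreatmentRecord:
    successes: int = 0
    failures: int = 0

    def success_probability(self, prior_a: float = 1.0, prior_b: float = 1.0) -> float:
        """Posterior mean success rate under a Beta(prior_a, prior_b) prior."""
        n = self.successes + self.failures
        return (self.successes + prior_a) / (n + prior_a + prior_b)


@dataclass
class ActuarialTable:
    """Outcome counts for each (problem category, treatment) pair."""
    records: Dict[Tuple[str, str], TreatmentRecord] = field(default_factory=dict)

    def record_outcome(self, problem: str, treatment: str, success: bool) -> None:
        """Store one observed outcome, creating the record if needed."""
        rec = self.records.setdefault((problem, treatment), TreatmentRecord())
        if success:
            rec.successes += 1
        else:
            rec.failures += 1

    def rank_treatments(self, problem: str) -> List[Tuple[str, float]]:
        """Treatments tried for this problem, highest estimated success rate first."""
        options = [(t, rec.success_probability())
                   for (p, t), rec in self.records.items() if p == problem]
        return sorted(options, key=lambda pair: pair[1], reverse=True)
```

The prior pulls estimates for rarely tried treatments toward one half. This keeps a single lucky outcome from dominating the ranking.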



Summary 

Given the rapid spread of computer terminals, one can anticipate extensive empirical research on automated testing during the 1970s. We contend that the trends found in the MTMT model will influence those efforts. We anticipate extensive work on the natural language, dialogue aspect of test administration. Both test scoring and interpretation will be influenced by the growing availability of IMS for education. Thus, this decade will undoubtedly represent the full flowering of the automated testing area.